智能论文笔记

TweetDrought: A Deep-Learning Drought Impacts Recognizer based on Twitter Data

Beichen Zhang , Frank Schilder , Kelly Helm Smith , Michael J. Hayes , Sherri Harms , Tsegaye Tadesse

分类：自然语言处理 | 机器学习

2022-12-07

Acquiring a better understanding of drought impacts becomes increasingly vital under a warming climate. Traditional drought indices describe mainly biophysical variables and not impacts on social, economic, and environmental systems. We utilized natural language processing and bidirectional encoder representation from Transformers (BERT) based transfer learning to fine-tune the model on the data from the news-based Drought Impact Report (DIR) and then apply it to recognize seven types of drought impacts based on the filtered Twitter data from the United States. Our model achieved a satisfying macro-F1 score of 0.89 on the DIR test set. The model was then applied to California tweets and validated with keyword-based labels. The macro-F1 score was 0.58. However, due to the limitation of keywords, we also spot-checked tweets with controversial labels. 83.5% of BERT labels were correct compared to the keyword labels. Overall, the fine-tuned BERT-based recognizer provided proper predictions and valuable information on drought impacts. The interpretation and analysis of the model were consistent with experiential domain expertise.

translated by 谷歌翻译

Quantitative Assessment of Drought Impacts Using XGBoost based on the Drought Impact Reporter

Beichen Zhang , Fatima K. Abu Salem , Michael J. Hayes , Tsegaye Tadesse

分类：机器学习

2022-11-04

Under climate change, the increasing frequency, intensity, and spatial extent of drought events lead to higher socio-economic costs. However, the relationships between the hydro-meteorological indicators and drought impacts are not identified well yet because of the complexity and data scarcity. In this paper, we proposed a framework based on the extreme gradient model (XGBoost) for Texas to predict multi-category drought impacts and connected a typical drought indicator, Standardized Precipitation Index (SPI), to the text-based impacts from the Drought Impact Reporter (DIR). The preliminary results of this study showed an outstanding performance of the well-trained models to assess drought impacts on agriculture, fire, society & public health, plants & wildlife, as well as relief, response & restrictions in Texas. It also provided a possibility to appraise drought impacts using hydro-meteorological indicators with the proposed framework in the United States, which could help drought risk management by giving additional information and improving the updating frequency of drought impacts. Our interpretation results using the Shapley additive explanation (SHAP) interpretability technique revealed that the rules guiding the predictions of XGBoost comply with domain expertise knowledge around the role that SPI indicators play around drought impacts.

translated by 谷歌翻译

Visible and Near Infrared Image Fusion Based on Texture Information

Guanyu Zhang , Beichen Sun , Yuehan Qi , Yang Liu

分类：计算机视觉

2022-07-22

多传感器融合被广泛用于自动驾驶汽车的环境感知系统。它解决了由环境变化引起的干扰，并使整个驾驶系统更安全，更可靠。在本文中，提出了一种基于纹理信息的新型可见和近红外融合方法，以增强非结构化的环境图像。它针对传统可见和近红外图像融合方法中的工件，信息丢失和噪声问题。首先，通过相对总变化（RTV）计算，可见图像（RGB）的结构信息（RGB）和近红外图像（NIR）作为融合图像的基础层；其次，建立了贝叶斯分类模型来计算噪声重量和可见图像中的噪声信息和噪声信息通过关节双侧滤波器自适应过滤；最后，融合图像是通过颜色空间转换获得的。实验结果表明，所提出的算法可以保留光谱特性和无伪影和颜色失真的可见和近红外图像的独特信息，并且具有良好的鲁棒性以及保留独特的质地。

translated by 谷歌翻译

JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding

Wayne Xin Zhao , Kun Zhou , Zheng Gong , Beichen Zhang , Yuanhang Zhou , Jing Sha , Zhigang Chen , Shijin Wang , Cong Liu , Ji-Rong Wen

分类：自然语言处理 | 人工智能

2022-06-13

本文旨在通过介绍第一个中国数学预训练的语言模型〜（PLM）来提高机器的数学智能，以有效理解和表示数学问题。与其他标准NLP任务不同，数学文本很难理解，因为它们在问题陈述中涉及数学术语，符号和公式。通常，它需要复杂的数学逻辑和背景知识来解决数学问题。考虑到数学文本的复杂性质，我们设计了一种新的课程预培训方法，用于改善由基本和高级课程组成的数学PLM的学习。特别是，我们首先根据位置偏见的掩盖策略执行令牌级预训练，然后设计基于逻辑的预训练任务，旨在分别恢复改组的句子和公式。最后，我们介绍了一项更加困难的预训练任务，该任务强制执行PLM以检测和纠正其生成的解决方案中的错误。我们对离线评估（包括九个与数学相关的任务）和在线$ A/B $测试进行了广泛的实验。实验结果证明了与许多竞争基线相比，我们的方法的有效性。我们的代码可在：\ textColor {blue} {\ url {https://github.com/rucaibox/jiuzhang}}}中获得。

translated by 谷歌翻译

User-Oriented Robust Reinforcement Learning

Haoyi You , Beichen Yu , Haiming Jin , Zhaoxing Yang , Jiahui Sun

分类：机器学习

2022-02-15

Recently, improving the robustness of policies across different environments attracts increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve the max-min robustness by optimizing the policy's performance in the worst-case environment. However, in practice, a user that uses an RL policy may have different preferences over its performance across environments. Clearly, the aforementioned max-min robustness is oftentimes too conservative to satisfy user preference. Therefore, in this paper, we integrate user preference into policy learning in robust RL, and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two different UOR-RL training algorithms for the scenarios with or without a priori known environment distribution, respectively. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate or completely no knowledge about the environment distribution. Furthermore, we carry out extensive experimental evaluations in 4 MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to the state-of-the-art baselines under the average and worst-case performance metrics, and more importantly establishes new state-of-the-art performance under the UOR metric.

translated by 谷歌翻译

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

Sucheng Ren , Fangyun Wei , Zheng Zhang , Han Hu

分类：计算机视觉

2023-01-03

Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc, revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) An intermediate layer of the teacher network as target perform better than that using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over the scratch MIM pre-training on ImageNet-1K classification, using all the ViT-Tiny, ViT-Small, and ViT-base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in AE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.

translated by 谷歌翻译

Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

分类：计算机视觉

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.

translated by 谷歌翻译

Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

分类：机器学习

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.

translated by 谷歌翻译

PMT-IQA: Progressive Multi-task Learning for Blind Image Quality Assessment

Qingyi Pan , Ning Guo , Letu Qingge , Jingyi Zhang , Pei Yang

分类：计算机视觉

2023-01-03

Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.

translated by 谷歌翻译

Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Li Zhang , Chris Callison-Burch

分类：自然语言处理

2023-01-03

Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) shows no such ability beyond naive repetition. Evaluating generated music is a challenging task, more so is evaluating drum grooves with little precedence in literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.

translated by 谷歌翻译